Objective: To forecast a real time series using ETS and ARIMA models.

set.seed(32296622)
myseries <- aus_retail |>
  filter(!(`Series ID` %in% c("A3349561R","A3349883F","A3349499L","A3349902A",
                        "A3349588R","A3349763L","A3349372C","A3349450X",
                        "A3349679W","A3349378T","A3349767W","A3349451A"))) |>
  filter(`Series ID` == sample(`Series ID`,1))

1 Statistical features of the original data.

1.1 Data Description

The data set used is a subset of the aus_retail data and contains one numeric variable, Turnover, measured in $Million AUD. aus_retail is part of the tsibbledata package and is also available by loading fpp3, a meta-package that attaches tsibbledata along with several other packages.

The data is a time series of class tsibble; its source is the Australian Bureau of Statistics, catalogue number 8501.0, table 11 (Robjhyndman, n.d.).

The sample data set contains the monthly Turnover for 'hardware, building and garden supplies retailing' in South Australia from April 1982 to December 2018, giving 441 observations and 5 columns.

The dataset contains the following variables; Table 1.1 also displays each variable's type.

data_dict <- data.frame(
  S.No. = c("1", "2", "3", "4", "5"),
  Variables = c ("State", "Industry", "Series ID", "Month", "Turnover" ),
  DataType = c("Character", "Character", "Character", "Month", "Numeric"))


knitr::kable(data_dict, caption = "Data Dictionary") |> 
  kable_styling(latex_options = c("striped", "hold_position")) |> 
  kable_paper("hover", full_width = T) |> 
  scroll_box(width = "100%", height = "300px")
Table 1.1: Data Dictionary
S.No. Variables DataType
1 State Character
2 Industry Character
3 Series ID Character
4 Month Month
5 Turnover Numeric

1.2 Data view

myseries
## # A tsibble: 441 x 5 [1M]
## # Key:       State, Industry [1]
##    State           Industry                        `Series ID`    Month Turnover
##    <chr>           <chr>                           <chr>          <mth>    <dbl>
##  1 South Australia Hardware, building and garden … A3349501L   1982 Apr      8.6
##  2 South Australia Hardware, building and garden … A3349501L   1982 May      9.5
##  3 South Australia Hardware, building and garden … A3349501L   1982 Jun      8.4
##  4 South Australia Hardware, building and garden … A3349501L   1982 Jul     10.3
##  5 South Australia Hardware, building and garden … A3349501L   1982 Aug     10.6
##  6 South Australia Hardware, building and garden … A3349501L   1982 Sep     11.5
##  7 South Australia Hardware, building and garden … A3349501L   1982 Oct     10.8
##  8 South Australia Hardware, building and garden … A3349501L   1982 Nov     13.1
##  9 South Australia Hardware, building and garden … A3349501L   1982 Dec     25.4
## 10 South Australia Hardware, building and garden … A3349501L   1983 Jan      9.2
## # ℹ 431 more rows

1.3 Simple Statistic features

1.3.1 Yearly average of Turnover

Table 1.2 displays the yearly average turnover in $Million AUD.

year_average <- myseries |> 
  mutate(year=year(Month)) |>
  index_by(year) |> 
  summarise(Total_turnover= mean(Turnover))

knitr::kable(year_average, caption = "Yearly Turnover Summary") |> 
  kable_styling(latex_options = c("striped", "hold_position")) |> 
  kable_paper("hover", full_width = T) |> 
  scroll_box(width = "100%", height = "300px")
Table 1.2: Yearly Turnover Summary
year Total_turnover
1982 12.02222
1983 11.17500
1984 13.39167
1985 14.46667
1986 12.91667
1987 14.03333
1988 16.21667
1989 20.43333
1990 23.45000
1991 23.94167
1992 28.28333
1993 27.84167
1994 28.45833
1995 27.71667
1996 31.39167
1997 34.41667
1998 39.23333
1999 42.93333
2000 55.35833
2001 70.86667
2002 74.66667
2003 70.90833
2004 69.04167
2005 62.55833
2006 67.86667
2007 75.57500
2008 96.25000
2009 88.49167
2010 85.64167
2011 85.88333
2012 70.88333
2013 66.82500
2014 66.57500
2015 79.94167
2016 82.10833
2017 92.55833
2018 98.35000

1.3.2 Overall mean

myseries |>
  features(Turnover, list(mean = mean)) |>
  arrange(mean)
## # A tibble: 1 × 3
##   State           Industry                                          mean
##   <chr>           <chr>                                            <dbl>
## 1 South Australia Hardware, building and garden supplies retailing  51.1

1.3.3 Quantile distribution

Here the data has been divided into four equal sections, each containing 25% of the data.

myseries |> 
  features(Turnover, quantile)
## # A tibble: 1 × 7
##   State           Industry                         `0%` `25%` `50%` `75%` `100%`
##   <chr>           <chr>                           <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 South Australia Hardware, building and garden …   8.4  24.6  53.5  76.7   124.

1.4 Statistical plots for data

Figure 1.1 shows that both ends of the Q-Q plot deviate from the reference line, indicating fat tails, and even the centre does not follow a straight line. This means the data contains many extreme values and departs from normality throughout.

ggplot(myseries, aes(sample = Turnover)) +
  geom_qq() + 
  geom_qq_line(color = "#6C03A0")+
  ylab("Sample Turnover Values ($Million AUD)")+
  xlab("Theoretical Values")+
  theme_minimal()+
   theme( plot.background = element_rect(fill = "#FFF8F7"),
    panel.grid.major.y = element_blank(),
      axis.text.x = element_text(color="#993333", angle=45, size=8),
    axis.text.y = element_text(colour= "#08007F", size=8))

Figure 1.1: Quantile-Quantile Plot

Figure 1.2 is a histogram whose shape is roughly bimodal, i.e. it shows two peaks. Both peaks appear to have similar densities, with the density curve rising and falling between them.

  ggplot(myseries, aes(x=Turnover)) +
  geom_histogram(aes(y=after_stat(density)), fill = "#6EBA6B")+
  geom_density(color = "#6C03A0") +
  ylab("Density")+
  xlab("Turnover ($Million AUD)")+
  theme_minimal()+
   theme( plot.background = element_rect(fill = "#FFF8F7"),
    panel.grid.major.y = element_blank(),
        axis.text.x = element_text(color="#993333", angle=45, size=8),
    axis.text.y = element_text(colour= "#08007F", size=8))

Figure 1.2: Distribution of Turnover

1.4.1 Exploring Time-Series Features

Figure 1.3 shows that:

  • Trend: an increasing trend is visible in the data set.

  • Seasonality: the seasonality appears multiplicative, as the size of the seasonal swings grows with the level of the series. Strong peaks are observed at the end of each year.

myseries |> 
  autoplot(Turnover, color= "#C63F2C")+
  xlab("Year/Month")+
  ylab("Turnover ($Million AUD)")+
  theme_minimal()+
   theme( plot.background = element_rect(fill = "#FFF8F7"),
    panel.grid.major.y = element_blank(),
       axis.text.x = element_text(color="#993333", angle=45, size=8),
    axis.text.y = element_text(colour= "#08007F", size=8))

Figure 1.3: Time Series

According to figure 1.4, the highest turnovers are observed each December, while February values sit towards the lower end. This is plausible because hardware, building and garden work are outdoor tasks preferred in summer, which in Australia peaks in December and winds down by February.

myseries |> 
  gg_season(Turnover, 
            labels = "both")+
  ylab("Turnover ($Million AUD)")+
   scale_colour_viridis_c()+
  theme_minimal()+
   theme( plot.background = element_rect(fill = "#FFF8F7"),
    panel.grid.major.y = element_blank(),
    axis.text.x = element_text(color="#993333", angle=45, size=8),
    axis.text.y = element_text(colour= "#08007F", size=8))

Figure 1.4: Seasonal Plot

In figure 1.5 the blue line shows the mean value for each month for all years. This plot confirms our observation from figure 1.4 that the highest turnover was observed in December and lowest in February.

myseries |> 
  gg_subseries(Turnover, color= "#156A06")+
  ylab("Turnover ($Million AUD)")+
  theme_minimal()+
   theme( plot.background = element_rect(fill = "#FFF8F7"),
    panel.grid.major.y = element_blank(),
      axis.text.x = element_text(color="#993333", angle=45, size=8),
    axis.text.y = element_text(colour= "#08007F", size=8))

Figure 1.5: Subseries Plot

The STL features below show high values for both the trend and seasonal strength.

myseries |>
  features(Turnover, feat_stl)
## # A tibble: 1 × 11
##   State        Industry trend_strength seasonal_strength_year seasonal_peak_year
##   <chr>        <chr>             <dbl>                  <dbl>              <dbl>
## 1 South Austr… Hardwar…          0.991                  0.872                  9
## # ℹ 6 more variables: seasonal_trough_year <dbl>, spikiness <dbl>,
## #   linearity <dbl>, curvature <dbl>, stl_e_acf1 <dbl>, stl_e_acf10 <dbl>

Here in figure 1.6, the colours indicate the different months. Most of the lag plots show positive correlation, confirming strong seasonality in the data.

myseries |> 
  gg_lag(Turnover, lags=1:24, geom='point', size=0.5)+
  facet_wrap(~ .lag, ncol=6)+
    ylab("Turnover ($Million AUD)")+
  xlab("Lag")+
  theme_minimal()+
   theme( plot.background = element_rect(fill = "#FFF8F7"),
    panel.grid.major.y = element_blank(),
       axis.text.x = element_text(color="#993333", angle=45, size=8),
    axis.text.y = element_text(colour= "#08007F", size=8))

Figure 1.6: Lag plot

Autocorrelation, i.e. the linear relationship between lagged values of the time series, is shown in figure 1.7. It shows that:

  • There are peaks at every 12th lag, i.e. at yearly intervals, indicating seasonality. An equivalent dip (from the previous lag) is also observed at equal intervals.

  • The "scalloped" shape observed is also due to seasonality.

  • The data has a trend, as the autocorrelations at small lags are large and positive (observations close in time are also close in value), and these positive values slowly decrease as the lag increases.

myseries |> 
ACF(Turnover, lag_max = 48) |> 
  autoplot()+
  ylab("ACF")+
  xlab("Lag [1 Month]")+
  theme_minimal()+
   theme( plot.background = element_rect(fill = "#FFF8F7"),
    panel.grid.major.y = element_blank(),
        axis.text.x = element_text(color="#993333", angle=45, size=8),
    axis.text.y = element_text(colour= "#08007F", size=8))

Figure 1.7: ACF

Below is a summary of seven autocorrelation features, in this order:

  • the first autocorrelation coefficient of the original data;

  • the sum of squares of the first 10 coefficients of the original data;

  • the first autocorrelation coefficient of the differenced data;

  • the sum of squares of the first 10 coefficients of the differenced data;

  • the first autocorrelation coefficient of the twice-differenced data;

  • the sum of squares of the first 10 coefficients of the twice-differenced data;

  • the autocorrelation coefficient at the first seasonal lag.

myseries |> 
  features(Turnover, feat_acf)
## # A tibble: 1 × 9
##   State       Industry  acf1 acf10 diff1_acf1 diff1_acf10 diff2_acf1 diff2_acf10
##   <chr>       <chr>    <dbl> <dbl>      <dbl>       <dbl>      <dbl>       <dbl>
## 1 South Aust… Hardwar… 0.957  7.84     -0.143      0.0904     -0.501       0.267
## # ℹ 1 more variable: season_acf1 <dbl>

2 Transformations and differencing used (including unit-root test).

Transformation and differencing are used to make the data stationary. The process has three steps:

  • Check for changing variance in the data; if present, transform the data with an appropriate lambda value. Evaluate whether the data is stationary and, if not, move to the next step.

  • If the data is seasonal, apply a seasonal difference to remove the seasonality. This step sometimes removes the trend as well. Evaluate whether the data is stationary and, if not, move to the next step.

  • Apply a regular difference to remove any remaining trend and anything else left over.
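For reference, the two differencing operations in steps 2 and 3 can be written as follows (for monthly data the seasonal period is m = 12):

```latex
% Seasonal difference (step 2): subtract the value from the same month one year earlier
y'_t = y_t - y_{t-12}

% Regular (first) difference (step 3): subtract the previous observation
y''_t = y'_t - y'_{t-1}
```

These are the standard definitions used by the difference() function applied later in this section.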

2.1 Step 1: Transformation

Figure 2.1 shows the time series after transforming Turnover to log(Turnover); there still seems to be some trend and seasonality in the data, so this is not an appropriate transformation on its own.

myseries |> 
  autoplot(log(Turnover))+
  ylab("Log Turnover")+
  xlab("Year Month")+
  theme_minimal()+
   theme( plot.background = element_rect(fill = "#FFF8F7"),
    panel.grid.major.y = element_blank(),
        axis.text.x = element_text(color="#993333", angle=45, size=8),
    axis.text.y = element_text(colour= "#08007F", size=8))

Figure 2.1: Log Transformation

Therefore, the Guerrero method is applied next to find the lambda value for a Box-Cox transformation.
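For reference, the Box-Cox transformation used below is the standard one, with the natural log as the limiting case:

```latex
w_t =
\begin{cases}
\log(y_t) & \text{if } \lambda = 0, \\
(y_t^{\lambda} - 1)/\lambda & \text{otherwise.}
\end{cases}
```

The Guerrero method chooses the lambda that makes the seasonal variation of the transformed series as stable as possible.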

lambda<- myseries |> 
  features(Turnover, features = guerrero)
lambda <- pull(lambda)

lambda
## [1] 0.4662629

In figure 2.2 a Box-Cox transformation with lambda = 0.4662629 is applied to Turnover, but a trend and seasonal pattern are still visible. The data is not stationary.

myseries |> 
  autoplot(box_cox(Turnover, lambda))+
  ylab("Transformed Turnover (lambda=0.466)")+
  xlab("Year Month")+
  theme_minimal()+
   theme( plot.background = element_rect(fill = "#FFF8F7"),
    panel.grid.major.y = element_blank(),
        axis.text.x = element_text(color="#993333", angle=45, size=8),
    axis.text.y = element_text(colour= "#08007F", size=8))

Figure 2.2: Box-Cox Transformation

Figure 2.3 shows the time plot and ACF together, and adds a new plot to the report, the PACF (which will be used in section 3).

myseries |> 
  gg_tsdisplay(box_cox(Turnover, lambda), plot_type = "partial")

Figure 2.3: Timeplot, ACF, PACF with Transform

There are still features of some trend and seasonality visible in the ACF, i.e. data is not stationary and therefore, requires seasonal difference.

2.1.1 Step 2: Seasonal Difference

The conclusion from figure 2.3 is confirmed with the KPSS unit-root test, whose null hypothesis H0 is that the data is stationary.

myseries |> 
  features(box_cox(Turnover, lambda), unitroot_kpss)
## # A tibble: 1 × 4
##   State           Industry                                 kpss_stat kpss_pvalue
##   <chr>           <chr>                                        <dbl>       <dbl>
## 1 South Australia Hardware, building and garden supplies …      6.91        0.01

Here the reported p-value is 0.01, which actually means p ≤ 0.01 (the reported value is capped at 0.01). Since p < 0.05, H0 is rejected and the data needs differencing.

Confirming if a seasonal difference is needed.

myseries |> 
  features(box_cox(Turnover, lambda), unitroot_nsdiffs)
## # A tibble: 1 × 3
##   State           Industry                                         nsdiffs
##   <chr>           <chr>                                              <int>
## 1 South Australia Hardware, building and garden supplies retailing       1

It is shown that 1 seasonal difference is needed.

In figure 2.3, the ACF shows seasonality and the data is monthly, hence a lag of 12 is used for the seasonal difference.

myseries |> 
  gg_tsdisplay(difference(box_cox(Turnover, lambda), lag=12), plot_type = "partial")

Figure 2.4: After Transforming and Seasonal Differencing

From figure 2.4, the series can be considered stationary, as:

  • In the ACF, the data falls to 0 quickly after the 11th lag.

  • The time plot shows unrelated values, i.e. not varying with time.

  • There seems to be no seasonality or trend left in the data.

This is confirmed by unit root KPSS tests below:

myseries |> 
  features(difference(box_cox(Turnover, lambda), lag=12), unitroot_kpss)
## # A tibble: 1 × 4
##   State           Industry                                 kpss_stat kpss_pvalue
##   <chr>           <chr>                                        <dbl>       <dbl>
## 1 South Australia Hardware, building and garden supplies …     0.155         0.1

Here the reported p-value is 0.1 (meaning p ≥ 0.1, as the reported value is capped), so H0 is not rejected and the series can be considered stationary.

Further checking whether an additional regular difference is required:

myseries |> 
  features(difference(box_cox(Turnover, lambda), lag=12), unitroot_ndiffs)
## # A tibble: 1 × 3
##   State           Industry                                         ndiffs
##   <chr>           <chr>                                             <int>
## 1 South Australia Hardware, building and garden supplies retailing      0

This shows that no further differencing is needed, i.e. step 3 (a regular difference) is not required. Hence, the data is now stationary.

3 Methodology used to create a short-list of appropriate ARIMA models and ETS models. (Includes AIC values as well as results from applying the models to a test-set consisting of the last 24 months of the data provided).

3.1 ETS

For this, test and train data sets are prepared, with the test set being the last 24 months, i.e. 2017 and 2018, and the rest forming the training set.

test <- myseries |> 
  filter(Month >= yearmonth("2017 Jan"))

train <- myseries |> 
  filter(Month < yearmonth("2017 Jan"))

3.1.1 Shortlist of ETS models with results

Observing figure 2.2, the error and seasonal components could be additive or multiplicative. There is a mild upward trend, which can be treated as additive. Appropriate candidates are therefore ETS(M,A,M), ETS(M,N,M) and ETS(M,Ad,M). Treating the seasonal component as additive also gives ETS(M,N,A) and ETS(A,Ad,A).

ets_fit <- train |> 
  model(MAM  = ETS(box_cox(Turnover, lambda) ~ error("M") + trend("A")  + season("M")),
        MNM  = ETS(box_cox(Turnover, lambda) ~ error("M") + trend("N")  + season("M")),
        MAdM = ETS(box_cox(Turnover, lambda) ~ error("M") + trend("Ad") + season("M")),
        MNA  = ETS(box_cox(Turnover, lambda) ~ error("M") + trend("N")  + season("A")),
        # note: this (A,Ad,A) specification reuses the name MAdM, so it takes the
        # place of the (M,Ad,M) model above and only five models appear in the output
        MAdM = ETS(box_cox(Turnover, lambda) ~ error("A") + trend("Ad") + season("A")),
        ets_auto = ETS(box_cox(Turnover, lambda))
        )

glance(ets_fit)
## # A tibble: 5 × 11
##   State     Industry .model  sigma2 log_lik   AIC  AICc   BIC   MSE  AMSE    MAE
##   <chr>     <chr>    <chr>    <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 South Au… Hardwar… MAM    0.00241   -933. 1900. 1901. 1968. 0.215 0.289 0.0363
## 2 South Au… Hardwar… MNM    0.00240   -930. 1890. 1892. 1951. 0.209 0.287 0.0365
## 3 South Au… Hardwar… MAdM   0.182     -894. 1825. 1827. 1897. 0.175 0.254 0.323 
## 4 South Au… Hardwar… MNA    0.00214   -906. 1842. 1843. 1902. 0.176 0.251 0.0344
## 5 South Au… Hardwar… ets_a… 0.180     -893. 1816. 1818. 1877. 0.174 0.253 0.322

Here the results are discussed for the shortlisted models and the automatically selected model. It is observed that the AICc value is lowest for the automatic ETS model.

ets_fit |> 
  select(ets_auto) |> 
  report()
## Series: Turnover 
## Model: ETS(A,N,A) 
## Transformation: box_cox(Turnover, lambda) 
##   Smoothing parameters:
##     alpha = 0.6642112 
##     gamma = 0.1496237 
## 
##   Initial states:
##      l[0]       s[0]      s[-1]      s[-2]   s[-3]     s[-4]     s[-5]
##  4.294015 -0.3972145 -0.5081342 -0.3191967 2.53842 0.3365877 0.1001217
##       s[-6]       s[-7]      s[-8]      s[-9]     s[-10]     s[-11]
##  0.07167334 -0.04089306 -0.3280601 -0.6257368 -0.2663899 -0.5611781
## 
##   sigma^2:  0.18
## 
##      AIC     AICc      BIC 
## 1816.493 1817.690 1876.989

The best model reported here is ETS(A,N,A).
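In component form, the ETS(A,N,A) model fitted above (on the Box-Cox scale, with the smoothing parameters taken from the report output) can be written as:

```latex
y_t    = \ell_{t-1} + s_{t-12} + \varepsilon_t   % observation equation
\ell_t = \ell_{t-1} + \alpha \varepsilon_t        % level equation,    \alpha \approx 0.664
s_t    = s_{t-12}   + \gamma \varepsilon_t        % seasonal equation, \gamma \approx 0.150
```

There is no trend equation because the trend component is "N" (none).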

ets_fit |> 
  forecast(h=24) |> 
  accuracy(myseries)
## # A tibble: 5 × 12
##   .model   State  Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>  <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 MAM      South… Hardwar… Test   8.74  10.1  8.74  8.93  8.93  1.45  1.18 0.348
## 2 MAdM     South… Hardwar… Test  12.8   14.5 12.8  12.8  12.8   2.12  1.70 0.601
## 3 MNA      South… Hardwar… Test  12.3   14.0 12.3  12.4  12.4   2.05  1.64 0.580
## 4 MNM      South… Hardwar… Test  12.9   14.2 12.9  13.1  13.1   2.15  1.67 0.551
## 5 ets_auto South… Hardwar… Test  12.8   14.5 12.8  12.8  12.8   2.12  1.70 0.600

We observe that the MAdM and ANA (ets_auto) models have almost identical test-set results, differing only in ACF1, where ANA is slightly lower. Combined with its lowest AICc, ETS(A,N,A) is selected as the best ETS model.

3.2 ARIMA

Referring to figure 2.4 (the transformed and seasonally differenced series), we identify the (p,d,q) and (P,D,Q)[12] orders. We already know that d = 0 and D = 1 (one seasonal difference).

For q we look at the ACF plot: the only significant lags before lag 12 are the 1st and possibly the 2nd, so q = 1 or q = 2. In the PACF, a significant lag appears at lag 13, so p = 1.

Hence the (p,d,q) candidates are (1,0,1), (2,0,1), (0,0,1), (1,0,2), (2,0,2), (0,0,2) and (2,0,0). Also, p = 0 when we only consider the ACF, and q = 0 when we only consider the PACF.

For the seasonal orders we only look at the seasonal lags, i.e. multiples of 12. In the ACF, the first seasonal lag may be significant, so Q = 1; since the ACF decays to 0 quickly at the seasonal lags, Q = 0 is also possible. From the PACF, only lags 12 and 24 appear significant, so P = 2; since even the first seasonal lag is not strongly significant, P = 0 is also considered.

Hence the (P,D,Q) candidates are (2,1,0) and (0,1,1).

Looking at the time plot, the differenced series is not centred on zero, so a constant may be present.

3.2.1 Shortlisting of ARIMA models with results

arima_fit <- train |> 
  model(
    arima101210 = ARIMA(box_cox(Turnover, lambda) ~ pdq(1,0,1) + PDQ(2,1,0)),
    arima201210 = ARIMA(box_cox(Turnover, lambda) ~ pdq(2,0,1) + PDQ(2,1,0)),
    arima001210 = ARIMA(box_cox(Turnover, lambda) ~ pdq(0,0,1) + PDQ(2,1,0)),
    arima101011 = ARIMA(box_cox(Turnover, lambda) ~ pdq(1,0,1) + PDQ(0,1,1)),
    arima102111 = ARIMA(box_cox(Turnover, lambda) ~ pdq(1,0,2) + PDQ(1,1,1)),
    arima_auto  = ARIMA(box_cox(Turnover, lambda), trace = TRUE)
) 
## Model specification      Selection metric
## ARIMA(2,0,2)(1,1,1)[12]+c    2516.509675
## ARIMA(0,0,0)(0,1,0)[12]+c    3102.480602
## ARIMA(1,0,0)(1,1,0)[12]+c    2590.688591
## ARIMA(0,0,1)(0,1,1)[12]+c    2853.611292
## ARIMA(2,0,2)(0,1,1)[12]+c    2506.156791
## ARIMA(2,0,2)(0,1,0)[12]+c    2673.939059
## ARIMA(1,0,2)(0,1,1)[12]+c    2503.103318
## ARIMA(1,0,2)(0,1,0)[12]+c    2671.002491
## ARIMA(0,0,2)(0,1,1)[12]+c    2747.122528
## ARIMA(1,0,1)(0,1,1)[12]+c    2507.648728
## ARIMA(1,0,2)(0,1,1)[12]      2503.699989
## ARIMA(1,0,3)(0,1,1)[12]+c    2505.174378
## ARIMA(1,0,2)(0,1,2)[12]+c    2505.170039
## ARIMA(1,0,2)(1,1,1)[12]+c    2510.213316
## ARIMA(0,0,2)(0,1,0)[12]+c    2783.907628
## ARIMA(1,0,1)(0,1,0)[12]+c    2671.501861
## ARIMA(1,0,2)(0,1,0)[12]      2671.300571
## ARIMA(1,0,3)(0,1,0)[12]+c    2672.738475
## ARIMA(0,0,2)(0,1,1)[12]      2784.501687
## ARIMA(0,0,3)(0,1,1)[12]+c    2688.354160
## ARIMA(1,0,1)(0,1,1)[12]      2509.200207
## ARIMA(1,0,3)(0,1,1)[12]      2505.750114
## ARIMA(2,0,1)(0,1,1)[12]+c    2505.334854
## ARIMA(2,0,2)(0,1,1)[12]      2506.660164
## ARIMA(2,0,3)(0,1,1)[12]+c    2508.231499
## ARIMA(0,0,2)(0,1,2)[12]+c    2746.043150
## ARIMA(1,0,1)(0,1,2)[12]+c    2509.659590
## ARIMA(1,0,2)(0,1,2)[12]      2505.758751
## ARIMA(1,0,3)(0,1,2)[12]+c    2507.251548
## ARIMA(2,0,2)(0,1,2)[12]+c    2508.235563
## ARIMA(1,0,2)(1,1,0)[12]+c    2576.589825
## ARIMA(0,0,2)(1,1,1)[12]+c    2750.877321
## ARIMA(1,0,1)(1,1,1)[12]+c    2513.673285
## ARIMA(1,0,2)(1,1,1)[12]      2510.648528
## ARIMA(1,0,3)(1,1,1)[12]+c    2511.680653
## ARIMA(1,0,2)(1,1,2)[12]+c    2505.742034
## 
## --- Re-estimating best models without approximation ---
## 
## ARIMA(1,0,2)(0,1,1)[12]+c    472.388817
tidy(arima_fit)
## # A tibble: 29 × 8
##    State           Industry  .model term  estimate std.error statistic   p.value
##    <chr>           <chr>     <chr>  <chr>    <dbl>     <dbl>     <dbl>     <dbl>
##  1 South Australia Hardware… arima… ar1     0.945     0.0183     51.7  1.46e-180
##  2 South Australia Hardware… arima… ma1    -0.258     0.0612     -4.22 3.05e-  5
##  3 South Australia Hardware… arima… sar1   -0.588     0.0488    -12.0  9.63e- 29
##  4 South Australia Hardware… arima… sar2   -0.270     0.0495     -5.47 8.03e-  8
##  5 South Australia Hardware… arima… cons…   0.0308    0.0163      1.89 5.96e-  2
##  6 South Australia Hardware… arima… ar1     1.27      0.129       9.80 1.73e- 20
##  7 South Australia Hardware… arima… ar2    -0.290     0.123      -2.36 1.87e-  2
##  8 South Australia Hardware… arima… ma1    -0.557     0.111      -5.01 8.08e-  7
##  9 South Australia Hardware… arima… sar1   -0.595     0.0487    -12.2  2.14e- 29
## 10 South Australia Hardware… arima… sar2   -0.265     0.0495     -5.35 1.47e-  7
## # ℹ 19 more rows

The parameter estimates show that some of the models include a constant.

glance(arima_fit)
## # A tibble: 6 × 10
##   State       Industry .model sigma2 log_lik   AIC  AICc   BIC ar_roots ma_roots
##   <chr>       <chr>    <chr>   <dbl>   <dbl> <dbl> <dbl> <dbl> <list>   <list>  
## 1 South Aust… Hardwar… arima…  0.208   -257.  526.  526.  550. <cpl>    <cpl>   
## 2 South Aust… Hardwar… arima…  0.207   -256.  524.  525.  548. <cpl>    <cpl>   
## 3 South Aust… Hardwar… arima…  0.444   -409.  827.  827.  847. <cpl>    <cpl>   
## 4 South Aust… Hardwar… arima…  0.182   -233.  477.  477.  497. <cpl>    <cpl>   
## 5 South Aust… Hardwar… arima…  0.180   -230.  474.  474.  502. <cpl>    <cpl>   
## 6 South Aust… Hardwar… arima…  0.179   -230.  472.  472.  496. <cpl>    <cpl>

This shows that the lowest AICc value belongs to the automatic ARIMA model, only slightly lower than that of the (1,0,2)(1,1,1) model.

arima_fit |> 
  select(arima_auto) |> 
  report()
## Series: Turnover 
## Model: ARIMA(1,0,2)(0,1,1)[12] w/ drift 
## Transformation: box_cox(Turnover, lambda) 
## 
## Coefficients:
##          ar1      ma1      ma2     sma1  constant
##       0.9742  -0.2652  -0.1226  -0.7777    0.0078
## s.e.  0.0124   0.0511   0.0480   0.0331    0.0029
## 
## sigma^2 estimated as 0.1794:  log likelihood=-230.09
## AIC=472.18   AICc=472.39   BIC=496.2

The automatic ARIMA model reported is ARIMA(1,0,2)(0,1,1)[12] with drift, and it is selected as the best ARIMA model.
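Written out with the backshift operator B (using fable's sign convention, in which the reported MA coefficients enter the MA polynomials with a plus sign), the fitted model on the Box-Cox scale w_t = box_cox(y_t, lambda) is approximately:

```latex
(1 - 0.9742B)\,(1 - B^{12})\, w_t
  = 0.0078 + (1 - 0.2652B - 0.1226B^{2})\,(1 - 0.7777B^{12})\,\varepsilon_t
```

The left side combines the AR(1) term with the seasonal difference, and the constant 0.0078 is the drift term reported in the output.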

arima_fit |> 
  forecast(h=24) |> 
  accuracy(myseries)
## # A tibble: 6 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 arima001… Sout… Hardwar… Test   8.03  9.74  8.03  7.96  7.96  1.34  1.14 0.508
## 2 arima101… Sout… Hardwar… Test   9.46 10.7   9.46  9.63  9.63  1.57  1.26 0.372
## 3 arima101… Sout… Hardwar… Test   8.71 10.2   8.71  8.75  8.75  1.45  1.19 0.413
## 4 arima102… Sout… Hardwar… Test   9.88 11.1   9.88 10.0  10.0   1.64  1.31 0.403
## 5 arima201… Sout… Hardwar… Test  10.4  11.8  10.4  10.5  10.5   1.73  1.39 0.496
## 6 arima_au… Sout… Hardwar… Test   9.93 11.2   9.93 10.1  10.1   1.65  1.31 0.406

4 Choosing one ARIMA model and one ETS model based on this analysis and showing parameter estimates, residual diagnostics, forecasts and prediction intervals for both models. Also diagnostic checking for both models including ACF graphs and the Ljung-Box test is shown.

The best ETS model chosen is ETS(A,N,A) and the best ARIMA model is ARIMA(1,0,2)(0,1,1)[12].

4.1 Parameter estimates

ets_fit |> 
  select(ets_auto) |> 
  tidy()
## # A tibble: 15 × 3
##    .model   term   estimate
##    <chr>    <chr>     <dbl>
##  1 ets_auto alpha    0.664 
##  2 ets_auto gamma    0.150 
##  3 ets_auto l[0]     4.29  
##  4 ets_auto s[0]    -0.397 
##  5 ets_auto s[-1]   -0.508 
##  6 ets_auto s[-2]   -0.319 
##  7 ets_auto s[-3]    2.54  
##  8 ets_auto s[-4]    0.337 
##  9 ets_auto s[-5]    0.100 
## 10 ets_auto s[-6]    0.0717
## 11 ets_auto s[-7]   -0.0409
## 12 ets_auto s[-8]   -0.328 
## 13 ets_auto s[-9]   -0.626 
## 14 ets_auto s[-10]  -0.266 
## 15 ets_auto s[-11]  -0.561
arima_fit |> 
  select(arima_auto) |> 
  tidy()
## # A tibble: 5 × 6
##   .model     term     estimate std.error statistic   p.value
##   <chr>      <chr>       <dbl>     <dbl>     <dbl>     <dbl>
## 1 arima_auto ar1       0.974     0.0124      78.8  1.13e-247
## 2 arima_auto ma1      -0.265     0.0511      -5.19 3.37e-  7
## 3 arima_auto ma2      -0.123     0.0480      -2.55 1.10e-  2
## 4 arima_auto sma1     -0.778     0.0331     -23.5  1.21e- 77
## 5 arima_auto constant  0.00779   0.00289      2.69 7.33e-  3

4.2 ETS

4.2.1 Residuals

Figure 4.1 shows the residuals for the best ETS model. They are fairly normally distributed, and the ACF is close to white noise, i.e. some autocorrelation remains in the residuals but most spikes are not significant.

gg_tsresiduals(ets_fit |> 
                 select(ets_auto))

Figure 4.1: Residuals ETS

4.2.2 Forecast

ets_fit  |> 
   select(State, Industry, ets_auto) |> 
  forecast(h=24) |> 
  autoplot(myseries)+
  ylab("Turnover ($Million AUD)")+
  xlab("Year Month")+
  theme_minimal()+
   theme( plot.background = element_rect(fill = "#FFF8F7"),
    panel.grid.major.y = element_blank(),
        axis.text.x = element_text(color="#993333", angle=45, size=8),
    axis.text.y = element_text(colour= "#08007F", size=8))

Figure 4.2: Forecast for ETS best model ANA

ets_fit |> 
  select(ets_auto) |> 
  augment()
## # A tsibble: 417 x 6 [1M]
## # Key:       .model [1]
##    .model      Month Turnover .fitted  .resid   .innov
##    <chr>       <mth>    <dbl>   <dbl>   <dbl>    <dbl>
##  1 ets_auto 1982 Apr      8.6    8.69 -0.0899 -0.0284 
##  2 ets_auto 1982 May      9.5    9.59 -0.0883 -0.0265 
##  3 ets_auto 1982 Jun      8.4    8.37  0.0274  0.00879
##  4 ets_auto 1982 Jul     10.3    9.34  0.955   0.282  
##  5 ets_auto 1982 Aug     10.6   11.0  -0.379  -0.107  
##  6 ets_auto 1982 Sep     11.5   11.1   0.370   0.101  
##  7 ets_auto 1982 Oct     10.8   11.5  -0.679  -0.188  
##  8 ets_auto 1982 Nov     13.1   11.9   1.21    0.313  
##  9 ets_auto 1982 Dec     25.4   22.8   2.61    0.478  
## 10 ets_auto 1983 Jan      9.2   11.4  -2.21   -0.638  
## # ℹ 407 more rows

4.2.3 Prediction Intervals

ets_fit |> 
  select(ets_auto) |> 
  forecast(h=24) |> 
  hilo(level=95)
## # A tsibble: 24 x 5 [1M]
## # Key:       .model [1]
##    .model      Month       Turnover .mean                   `95%`
##    <chr>       <mth>         <dist> <dbl>                  <hilo>
##  1 ets_auto 2017 Jan t(N(15, 0.18))  82.5 [73.87755,  91.39686]95
##  2 ets_auto 2017 Feb t(N(14, 0.26))  71.4 [61.94528,  81.42036]95
##  3 ets_auto 2017 Mar t(N(14, 0.34))  79.6 [68.15389,  91.73461]95
##  4 ets_auto 2017 Apr t(N(14, 0.42))  78.2 [65.69555,  91.65149]95
##  5 ets_auto 2017 May  t(N(14, 0.5))  76.3 [62.82308,  90.74869]95
##  6 ets_auto 2017 Jun t(N(14, 0.58))  76.1 [61.65714,  91.68188]95
##  7 ets_auto 2017 Jul t(N(14, 0.66))  76.5 [61.14928,  93.27143]95
##  8 ets_auto 2017 Aug t(N(14, 0.74))  80.1 [63.44050,  98.28388]95
##  9 ets_auto 2017 Sep t(N(15, 0.82))  85.3 [67.18874, 105.11464]95
## 10 ets_auto 2017 Oct t(N(16, 0.89))  93.0 [73.15791, 114.76831]95
## # ℹ 14 more rows

4.2.4 ACF for ETS best model

Figure 4.3 zooms in on the ACF panel of figure 4.1 and suggests some remaining autocorrelation.

augment(ets_fit) |> 
  filter(.model== "ets_auto") |> 
  ACF(.innov) |>
  autoplot() +
  theme_minimal()+
   theme( plot.background = element_rect(fill = "#FFF8F7"),
    panel.grid.major.y = element_blank(),
        axis.text.x = element_text(color="#993333", angle=45, size=8),
    axis.text.y = element_text(colour= "#08007F", size=8))

Figure 4.3: ACF Diagnostics for ETS best model

4.2.5 Ljung-Box test

This test uses the null hypothesis H0 that the residuals are independently distributed, i.e. there is no remaining autocorrelation.
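The test statistic is the standard Ljung-Box one, computed from the first ℓ residual autocorrelations r_k (in the call below, ℓ = 24, and dof = 18 is subtracted from the degrees of freedom of the reference distribution to account for estimated parameters):

```latex
Q^{*} = T\,(T+2) \sum_{k=1}^{\ell} \frac{r_k^{2}}{T-k},
\qquad Q^{*} \sim \chi^{2}_{\,\ell - \mathrm{dof}} \ \text{under } H_0
```

Here T is the number of observations used to fit the model; large values of Q* indicate that the residual autocorrelations are too big to be white noise.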

augment(ets_fit) |> 
  filter(.model== "ets_auto") |> 
  features(.innov, ljung_box, lag=24, dof=18)
## # A tibble: 1 × 5
##   State           Industry                              .model lb_stat lb_pvalue
##   <chr>           <chr>                                 <chr>    <dbl>     <dbl>
## 1 South Australia Hardware, building and garden suppli… ets_a…    28.5 0.0000764

Here the p-value is < 0.05, so the null hypothesis is rejected: the residuals are autocorrelated.

4.3 ARIMA

Figure 4.4 shows that the ARIMA residuals are approximately normally distributed and resemble white noise.

gg_tsresiduals(arima_fit |> 
                 select(arima_auto))

Figure 4.4: Residuals ARIMA

4.3.1 Forecast

arima_fit |>
  select(State, Industry, arima_auto) |>
  forecast(h = 24) |>
  autoplot(myseries) +
  ylab("Turnover ($Million AUD)") +
  xlab("Year Month") +
  theme_minimal() +
  theme(plot.background = element_rect(fill = "#FFF8F7"),
        panel.grid.major.y = element_blank(),
        axis.text.x = element_text(colour = "#993333", angle = 45, size = 8),
        axis.text.y = element_text(colour = "#08007F", size = 8))

Figure 4.5: Forecast from the best ARIMA model, ARIMA(1,0,2)(0,1,1)

arima_fit |> 
  select(arima_auto) |> 
  augment()
## # A tsibble: 417 x 6 [1M]
## # Key:       .model [1]
##    .model        Month Turnover .fitted .resid  .innov
##    <chr>         <mth>    <dbl>   <dbl>  <dbl>   <dbl>
##  1 arima_auto 1982 Apr      8.6    8.59 0.0107 0.00340
##  2 arima_auto 1982 May      9.5    9.49 0.0122 0.00368
##  3 arima_auto 1982 Jun      8.4    8.39 0.0104 0.00334
##  4 arima_auto 1982 Jul     10.3   10.3  0.0136 0.00392
##  5 arima_auto 1982 Aug     10.6   10.6  0.0141 0.00400
##  6 arima_auto 1982 Sep     11.5   11.5  0.0157 0.00425
##  7 arima_auto 1982 Oct     10.8   10.8  0.0144 0.00406
##  8 arima_auto 1982 Nov     13.1   13.1  0.0184 0.00467
##  9 arima_auto 1982 Dec     25.4   25.4  0.0407 0.00725
## 10 arima_auto 1983 Jan      9.2    9.19 0.0117 0.00359
## # ℹ 407 more rows

4.3.2 Prediction Intervals

arima_fit |> 
  select(arima_auto) |> 
  forecast(h=24) |> 
  hilo(level=95)
## # A tsibble: 24 x 5 [1M]
## # Key:       .model [1]
##    .model        Month       Turnover .mean                   `95%`
##    <chr>         <mth>         <dist> <dbl>                  <hilo>
##  1 arima_auto 2017 Jan t(N(15, 0.18))  81.4 [72.85991,  90.22765]95
##  2 arima_auto 2017 Feb t(N(14, 0.27))  71.7 [62.03240,  81.92949]95
##  3 arima_auto 2017 Mar t(N(14, 0.33))  80.2 [68.87268,  92.14720]95
##  4 arima_auto 2017 Apr t(N(14, 0.38))  78.7 [66.61905,  91.51518]95
##  5 arima_auto 2017 May t(N(14, 0.43))  76.0 [63.45951,  89.51768]95
##  6 arima_auto 2017 Jun t(N(14, 0.48))  75.2 [61.98916,  89.31625]95
##  7 arima_auto 2017 Jul t(N(14, 0.53))  75.8 [62.00954,  90.76842]95
##  8 arima_auto 2017 Aug t(N(14, 0.58))  80.4 [65.57069,  96.46289]95
##  9 arima_auto 2017 Sep t(N(15, 0.62))  87.1 [71.04251, 104.44180]95
## 10 arima_auto 2017 Oct t(N(16, 0.66))  96.5 [78.98670, 115.39304]95
## # ℹ 14 more rows

4.3.3 Diagnostic test: ACF

Figure 4.6 shows that most autocorrelations lie within the blue bounds, consistent with white noise at most lags.

augment(arima_fit) |>
  filter(.model == "arima_auto") |>
  ACF(.innov) |>
  autoplot() +
  theme_minimal() +
  theme(plot.background = element_rect(fill = "#FFF8F7"),
        panel.grid.major.y = element_blank(),
        axis.text.x = element_text(colour = "#993333", angle = 45, size = 8),
        axis.text.y = element_text(colour = "#08007F", size = 8))

Figure 4.6: ACF for best fit ARIMA model

4.3.4 Ljung-Box test

augment(arima_fit) |> 
  filter(.model== "arima_auto") |> 
  features(.innov, ljung_box, lag=36, dof=24)
## # A tibble: 1 × 5
##   State           Industry                              .model lb_stat lb_pvalue
##   <chr>           <chr>                                 <chr>    <dbl>     <dbl>
## 1 South Australia Hardware, building and garden suppli… arima…    36.7  0.000248

Here the p-value is < 0.05, so the null hypothesis of independently distributed residuals is rejected.

5 Comparison of the results from each of the preferred models and the best method for better forecasts (reference to the test-set).

5.1 Comparison

  • Both models show roughly normally distributed residuals that resemble white noise (figures 4.1 and 4.4).

  • The forecasts are similar for both models, but the augmented ETS fit has more negative values in .resid and .innov.

  • The prediction intervals of the ARIMA model are narrower than those of the ETS model.

  • Both models return a p-value < 0.05 in the Ljung-Box test.
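
The interval-width comparison can also be checked numerically. The helper below is a sketch (the function name `interval_width` is hypothetical; it assumes the `ets_fit` and `arima_fit` mables from the earlier sections are still available):

```r
# Sketch: mean 95% interval width per model (narrower = more precise).
interval_width <- function(fit, model) {
  fit |>
    select(all_of(model)) |>
    forecast(h = 24) |>
    hilo(level = 95) |>
    as_tibble() |>
    summarise(.model = model,
              mean_width = mean(`95%`$upper - `95%`$lower))
}

bind_rows(interval_width(ets_fit,   "ets_auto"),
          interval_width(arima_fit, "arima_auto"))
```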

5.2 Best method for better forecast for test set

both_best <- train |>
  model(ets_auto   = ETS(box_cox(Turnover, lambda)),
        arima_auto = ARIMA(box_cox(Turnover, lambda)))

both_best |>
  forecast(h = 24) |>
  autoplot(test) +
  ylab("Turnover ($Million AUD)") +
  xlab("Year Month") +
  theme_minimal() +
  theme(plot.background = element_rect(fill = "#FFF8F7"),
        panel.grid.major.y = element_blank(),
        axis.text.x = element_text(colour = "#993333", angle = 45, size = 8),
        axis.text.y = element_text(colour = "#08007F", size = 8))

Figure 5.1: Best model forecast for test set

Comparing figure 5.1 with figure 4.2 (the overall forecast comparison of both best models) shows that the prediction intervals are narrower for the best ARIMA model.

both_best |> 
  forecast(h=24) |> 
  accuracy(myseries)
## # A tibble: 2 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 arima_au… Sout… Hardwar… Test   9.93  11.2  9.93  10.1  10.1  1.65  1.31 0.406
## 2 ets_auto  Sout… Hardwar… Test  12.8   14.5 12.8   12.8  12.8  2.12  1.70 0.600

The accuracy measures (RMSE, MAE, MASE, etc.) are lower for the best ARIMA model, suggesting better forecasts on the test set.
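
The same conclusion can be reached programmatically; a small sketch picking the model with the lowest test-set RMSE:

```r
# Sketch: select the model with the lowest RMSE on the test set.
both_best |>
  forecast(h = 24) |>
  accuracy(myseries) |>
  slice_min(RMSE, n = 1) |>
  select(.model, RMSE, MAE, MASE)
```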

6 Applying the two chosen models to the full data set, re-estimating the parameters but not changing the model structure. Producing out-of-sample point forecasts and 80% prediction intervals for each model for two years past the end of the data provided.

6.1 Dataset on full data

both_best2 <- myseries |>
  # re-estimate parameters on the full data while keeping the model
  # structures selected earlier: ETS(A,N,A) and ARIMA(1,0,2)(0,1,1)
  model(ets_ANA      = ETS(box_cox(Turnover, lambda) ~ error("A") + trend("N") + season("A")),
        arima_102011 = ARIMA(box_cox(Turnover, lambda) ~ pdq(1, 0, 2) + PDQ(0, 1, 1)))

6.2 Re-estimating parameters

6.2.1 Estimates

both_best2 |> 
  tidy()
## # A tibble: 20 × 8
##    State           Industry .model term  estimate std.error statistic    p.value
##    <chr>           <chr>    <chr>  <chr>    <dbl>     <dbl>     <dbl>      <dbl>
##  1 South Australia Hardwar… ets_A… alpha  0.659    NA           NA    NA        
##  2 South Australia Hardwar… ets_A… gamma  0.159    NA           NA    NA        
##  3 South Australia Hardwar… ets_A… l[0]   4.28     NA           NA    NA        
##  4 South Australia Hardwar… ets_A… s[0]  -0.406    NA           NA    NA        
##  5 South Australia Hardwar… ets_A… s[-1] -0.499    NA           NA    NA        
##  6 South Australia Hardwar… ets_A… s[-2] -0.314    NA           NA    NA        
##  7 South Australia Hardwar… ets_A… s[-3]  2.53     NA           NA    NA        
##  8 South Australia Hardwar… ets_A… s[-4]  0.321    NA           NA    NA        
##  9 South Australia Hardwar… ets_A… s[-5]  0.0878   NA           NA    NA        
## 10 South Australia Hardwar… ets_A… s[-6]  0.0627   NA           NA    NA        
## 11 South Australia Hardwar… ets_A… s[-7] -0.0421   NA           NA    NA        
## 12 South Australia Hardwar… ets_A… s[-8] -0.317    NA           NA    NA        
## 13 South Australia Hardwar… ets_A… s[-9] -0.604    NA           NA    NA        
## 14 South Australia Hardwar… ets_A… s[-1… -0.256    NA           NA    NA        
## 15 South Australia Hardwar… ets_A… s[-1… -0.559    NA           NA    NA        
## 16 South Australia Hardwar… arima… ar1    0.972     0.0124      78.6   6.60e-257
## 17 South Australia Hardwar… arima… ma1   -0.254     0.0496      -5.13  4.39e-  7
## 18 South Australia Hardwar… arima… ma2   -0.143     0.0461      -3.09  2.14e-  3
## 19 South Australia Hardwar… arima… sma1  -0.767     0.0326     -23.5   2.81e- 79
## 20 South Australia Hardwar… arima… cons…  0.00889   0.00289      3.07  2.27e-  3

6.2.2 Residuals

gg_tsresiduals(both_best2 |>
                 select(ets_ANA)) +
  ggtitle("ETS(A,N,A) residuals on full dataset")

gg_tsresiduals(both_best2 |>
                 select(arima_102011)) +
  ggtitle("ARIMA(1,0,2)(0,1,1) residuals on full dataset")

6.2.3 Model diagnostics

both_best2 |> 
  select(ets_ANA) |> 
  augment()
## # A tsibble: 441 x 6 [1M]
## # Key:       .model [1]
##    .model     Month Turnover .fitted  .resid   .innov
##    <chr>      <mth>    <dbl>   <dbl>   <dbl>    <dbl>
##  1 ets_ANA 1982 Apr      8.6    8.64 -0.0399 -0.0126 
##  2 ets_ANA 1982 May      9.5    9.60 -0.0961 -0.0288 
##  3 ets_ANA 1982 Jun      8.4    8.41 -0.0113 -0.00364
##  4 ets_ANA 1982 Jul     10.3    9.32  0.977   0.289  
##  5 ets_ANA 1982 Aug     10.6   10.9  -0.323  -0.0909 
##  6 ets_ANA 1982 Sep     11.5   11.1   0.416   0.114  
##  7 ets_ANA 1982 Oct     10.8   11.4  -0.650  -0.180  
##  8 ets_ANA 1982 Nov     13.1   11.9   1.22    0.318  
##  9 ets_ANA 1982 Dec     25.4   22.8   2.61    0.478  
## 10 ets_ANA 1983 Jan      9.2   11.5  -2.26   -0.652  
## # ℹ 431 more rows
both_best2 |> 
  select(arima_102011) |> 
  augment()
## # A tsibble: 441 x 6 [1M]
## # Key:       .model [1]
##    .model          Month Turnover .fitted .resid  .innov
##    <chr>           <mth>    <dbl>   <dbl>  <dbl>   <dbl>
##  1 arima_102011 1982 Apr      8.6    8.59 0.0107 0.00338
##  2 arima_102011 1982 May      9.5    9.49 0.0122 0.00366
##  3 arima_102011 1982 Jun      8.4    8.39 0.0103 0.00332
##  4 arima_102011 1982 Jul     10.3   10.3  0.0135 0.00390
##  5 arima_102011 1982 Aug     10.6   10.6  0.0140 0.00398
##  6 arima_102011 1982 Sep     11.5   11.5  0.0156 0.00423
##  7 arima_102011 1982 Oct     10.8   10.8  0.0144 0.00404
##  8 arima_102011 1982 Nov     13.1   13.1  0.0184 0.00465
##  9 arima_102011 1982 Dec     25.4   25.4  0.0406 0.00723
## 10 arima_102011 1983 Jan      9.2    9.19 0.0117 0.00357
## # ℹ 431 more rows

6.2.4 Producing out-of-sample forecasts

Figure 6.1 shows the forecasts of the two best models on the full data set with 80% prediction intervals.

both_best2 |>
  forecast(h = 24) |>
  autoplot(myseries, level = 80) +
  ylab("Turnover ($Million AUD)") +
  xlab("Year Month") +
  theme_minimal() +
  theme(plot.background = element_rect(fill = "#FFF8F7"),
        panel.grid.major.y = element_blank(),
        axis.text.x = element_text(colour = "#993333", angle = 45, size = 8),
        axis.text.y = element_text(colour = "#08007F", size = 8))

Figure 6.1: 80% interval forecast by best models

7 Obtaining up-to-date data from the ABS website, using the previous release of data rather than the latest release, and comparing the above forecasts with the actual numbers.

abs_retail_data <- read_abs("8501.0", tables = "11")

# note: the name `abs` masks base::abs() in this session
abs <- abs_retail_data |>
  separate(series, into = c("extra", "State", "Industry"), sep = ";") |>
  mutate(Month = yearmonth(date),
         State = trimws(State),
         Industry = trimws(Industry)) |>
  select(Month, State, value, Industry) |>
  filter(State == "South Australia",
         Industry == "Hardware, building and garden supplies retailing") |>
  rename(Turnover = value) |>
  filter(Month > yearmonth("2018 Dec"))

Comparing figures 7.1 and 7.2, we see that the ETS forecasts lag behind the actual values, whereas the ARIMA forecasts are closer to them.

abs |>
  ggplot(aes(x = Month, y = Turnover)) +
  geom_line() +
  ylab("Turnover ($Million AUD)") +
  xlab("Year Month") +
  theme_minimal() +
  theme(plot.background = element_rect(fill = "#FFF8F7"),
        panel.grid.major.y = element_blank(),
        axis.text.x = element_text(colour = "#993333", angle = 45, size = 8),
        axis.text.y = element_text(colour = "#08007F", size = 8))

Figure 7.1: ABS data plot

both_best2 |>
  forecast(h = 51) |>
  autoplot() +
  ylab("Turnover ($Million AUD)") +
  xlab("Year Month") +
  theme_minimal() +
  theme(plot.background = element_rect(fill = "#FFF8F7"),
        panel.grid.major.y = element_blank(),
        axis.text.x = element_text(colour = "#993333", angle = 45, size = 8),
        axis.text.y = element_text(colour = "#08007F", size = 8))

Figure 7.2: Forecast from myseries data

The ABS data contain 51 more months than the myseries data; hence, forecasts for 51 months are produced in figure 7.2.
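
Rather than hard-coding 51, the horizon can be derived from the data. A sketch, assuming `abs` contains exactly the months after the end of `myseries`:

```r
# Sketch: the forecast horizon is the number of extra months in the ABS data.
h_extra <- nrow(abs)
h_extra
# 51
```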

Figure 7.3 below shows the residual values for both models when compared with the original ABS data.

both_best2 |>
  forecast(h = 51) |>
  # attach the actual ABS values (repeated once per model)
  mutate(Turnover.ABS = rep(abs$Turnover, times = 2)) |>
  # difference between the ABS values and the forecast means
  mutate(.resid = Turnover.ABS - .mean) |>
  as_tsibble() |>
  select(-c(State, Industry)) |>
  autoplot(.resid) +
  ylab("Turnover ($Million AUD)") +
  xlab("Year Month") +
  theme_minimal() +
  theme(plot.background = element_rect(fill = "#FFF8F7"),
        panel.grid.major.y = element_blank(),
        axis.text.x = element_text(colour = "#993333", angle = 45, size = 8),
        axis.text.y = element_text(colour = "#08007F", size = 8))

Figure 7.3: Difference between the ABS original values and the mean of the forecast values

In figure 7.4, the ACF of the ETS residuals decays by lag 12, after which the autocorrelations are close to zero.

both_best2 |>
  forecast(h = 51) |>
  mutate(Turnover.ABS = rep(abs$Turnover, times = 2)) |>
  mutate(.resid = Turnover.ABS - .mean) |>
  as_tsibble() |>
  ACF(.resid) |>
  autoplot() +
  theme_minimal() +
  theme(plot.background = element_rect(fill = "#FFF8F7"),
        panel.grid.major.y = element_blank(),
        axis.text.x = element_text(colour = "#993333", angle = 45, size = 8),
        axis.text.y = element_text(colour = "#08007F", size = 8))

Figure 7.4: ACF of residuals

Figure 7.5 shows that both models produce somewhat bimodal residual histograms, though this is clearer for the ARIMA model.

both_best2 |>
  forecast(h = 51) |>
  mutate(Turnover.ABS = rep(abs$Turnover, times = 2)) |>
  mutate(.resid = Turnover.ABS - .mean) |>
  as_tsibble() |>
  ggplot(aes(x = .resid)) +
  geom_histogram(aes(y = after_stat(density)), fill = "#6EBA6B") +
  geom_density(color = "#6C03A0") +
  facet_grid(~ .model) +
  theme_minimal() +
  theme(plot.background = element_rect(fill = "#FFF8F7"),
        panel.grid.major.y = element_blank(),
        axis.text.x = element_text(colour = "#993333", angle = 45, size = 8),
        axis.text.y = element_text(colour = "#08007F", size = 8))

Figure 7.5: Histogram of residuals

8 Benefits and limitations of the models for the data.

  • The ETS model chosen was ETS(A,N,A) and the ARIMA model chosen was ARIMA(1,0,2)(0,1,1).

  • All ETS models are non-stationary, whereas some ARIMA models are stationary; ARIMA handles non-stationarity through differencing.

  • The ARIMA model fits the training data slightly better than the ETS model.

  • ETS modelling gives more weight to recent observations.

  • The ETS selection was somewhat unexpected: it chose ETS(A,N,A), with no trend component, and so did not capture the trend well.

  • Compared with the actual values, the ETS forecasts lagged further behind than the ARIMA forecasts.

  • Outliers can distort ARIMA estimates, so a duplicate check was performed before modelling.
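
The duplicate check mentioned above can be done with tsibble's built-in helpers; a minimal sketch:

```r
library(tsibble)
# Sketch: confirm there are no duplicated (key, index) observations.
myseries |>
  are_duplicated(index = Month, key = c(State, Industry)) |>
  any()
# FALSE when the series is duplicate-free
```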

Below, the accuracy of both models is compared on the training set and the test set; ARIMA has lower error values and is therefore slightly more accurate.

bind_rows(
  arima_fit |> select(arima_auto) |> accuracy(),
  ets_fit   |> select(ets_auto)   |> accuracy(),
  arima_fit |> select(arima_auto) |> forecast(h = 10) |> accuracy(test),
  ets_fit   |> select(ets_auto)   |> forecast(h = 10) |> accuracy(test)
)
## # A tibble: 4 × 10
##   .model     .type         ME  RMSE   MAE    MPE  MAPE    MASE   RMSSE   ACF1
##   <chr>      <chr>      <dbl> <dbl> <dbl>  <dbl> <dbl>   <dbl>   <dbl>  <dbl>
## 1 arima_auto Training -0.0319  3.58  2.52 -0.230  5.58   0.419   0.421 0.0618
## 2 ets_auto   Training  0.257   3.62  2.59  0.437  5.75   0.430   0.425 0.0887
## 3 arima_auto Test      7.96    9.32  7.96  8.64   8.64 NaN     NaN     0.135 
## 4 ets_auto   Test      8.35   10.2   8.35  8.96   8.96 NaN     NaN     0.176

9 References

Australian retail trade turnover. rdrr.io. https://rdrr.io/cran/tsibbledata/man/aus_retail.html

Chang, W. (2023, May 16). 13.13 Creating a QQ Plot | R Graphics Cookbook, 2nd edition. https://r-graphics.org/recipe-miscgraph-qq

Coder, R. (2021). Histogram with density in ggplot2. R CHARTS | a Collection of Charts and Graphs Made With the R Programming Language. https://r-charts.com/distribution/histogram-density-ggplot2/

Forecasting: Principles and Practice (3rd ed). (n.d.). https://otexts.com/fpp3/

Holtz, Y. (n.d.). The R Graph Gallery – Help and inspiration for R charts. The R Graph Gallery. https://r-graph-gallery.com/

Retail Trade, Australia, March 2023. (2023, May 9). Australian Bureau of Statistics. https://www.abs.gov.au/statistics/industry/retail-and-wholesale-trade/retail-trade-australia/latest-release

Robjhyndman. (n.d.). fpp3package/README.Rmd at master · robjhyndman/fpp3package. GitHub. https://github.com/robjhyndman/fpp3package/blob/master/README.Rmd

Roy, A. (2023). What is Sales Forecasting? CX Today. https://www.cxtoday.com/contact-centre/what-is-sales-forecasting/